Abstract. In the past decades, different approaches have been developed in order to link the 
physical properties of galaxies to the dark matter haloes in which they reside. In this review, I 
give a brief overview of methods, aims, and limits of these techniques, with particular emphasis 
on semi-analytic models of galaxy formation. For these models, I also provide a brief summary of 
recent successes and open problems. 

INTRODUCTION 

During the last decades, a number of observational tests have converged to establish 
the ΛCDM model as the de facto standard cosmological model for structure formation. 
Over the past years, it has been shown that this model can account simultaneously for 
the present acceleration of cosmic expansion as inferred from supernovae explosions [1], 
the structure seen in the z = 3 Lyα forest [2], the power spectrum of low redshift 
galaxies [3], the most recent measurements of the microwave background fluctuations 
[4], and a number of other important observational constraints. Although the candidate 
particle for the non-baryonic dark matter has yet to be detected in the laboratory, and 
the nature of dark energy remains unknown, the fundamental cosmological parameters 
are now known with uncertainties of only a few per cent, removing a large part of the 
parameter space in galaxy formation studies. 

While the basic theoretical paradigm for structure formation appears to be well estab- 
lished, our understanding of the physical processes that lead to the variety of observed 
galaxy properties is still far from complete. Although I have kept the word ab initio 
in the title of this review, as suggested by the organizers of this meeting, I would like 
to stress that ab initio treatments of the galaxy formation process are very difficult - if 
not unfeasible - simply because we do not have a complete understanding of the many 
different and complex physical processes which are at play. 

Today's models of galaxy formation find their seeds in the pioneering work by White 
and Rees [5], who proposed that galaxies form when gas condenses at the centre of dark 
matter haloes, following the radiative cooling of baryons. In the following years, three 
different approaches have been developed in order to link the observed properties of 
luminous galaxies to the dark matter haloes in which they reside. 

In semi-analytic models of galaxy formation, which I will discuss in more detail in 
the following, the evolution of the baryonic components of galaxies is modelled using 
simple yet physically and/or observationally motivated prescriptions. Modern semi-analytic 
models take advantage of high-resolution N-body simulations to specify the 
location and evolution of dark matter haloes, which are assumed to be the birth-places 
of luminous galaxies. Since pure N-body simulations can handle very large numbers 
of particles, this approach can access very large dynamic ranges in mass and spatial 
resolution. In addition, the computational costs are limited so that the method allows a 
fast exploration of the parameter space and an efficient investigation of different specific 
physical assumptions. 

Direct hydrodynamical simulations provide an explicit description of gas dynamics. 
As a tool for studying galaxy formation, it is worth recalling that these methods are still 
limited by relatively low mass and spatial resolution, and by computational costs that are 
still prohibitive for simulations of galaxies throughout large volumes. In addition, and 
perhaps more importantly, complex physical processes such as star formation, feedback, 
etc. still need to be modelled as sub-grid physics, either because the resolution of the 
simulation becomes inadequate or because (and this is almost always true) we do not 
have a 'complete theory' for the particular physical process under consideration. 

A third approach - usually referred to as Halo Occupation Distribution (HOD) 
modelling - has become popular in more recent years. This method essentially bypasses any 
explicit modelling of the physical processes driving galaxy formation and evolution, and 
specifies the link between dark matter haloes and galaxies in a purely statistical fashion. 
The method is conceptually very simple and easy to implement, and it can be constrained 
using the increasing amount of available information on clustering properties of galaxies 
at different cosmic epochs. It remains difficult, however, to move from a purely statistical 
characterization of the link between dark matter haloes and galaxies to a more physical 
understanding of the galaxy formation process itself. 
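
To make the phrase 'purely statistical' more concrete, the sketch below evaluates the mean number of galaxies per halo as a function of halo mass only, using one commonly adopted parametrization (a smoothed step for central galaxies plus a power law for satellites). The functional form, the function names, and all parameter values (log_m_min, sigma_log_m, m_cut, m_1, alpha) are assumptions chosen for illustration; they are not taken from this review and would in practice be constrained by clustering measurements.

```python
import math


def mean_central(m_halo, log_m_min=12.0, sigma_log_m=0.3):
    """Mean number of central galaxies: a smoothed step in log10(halo mass).

    All parameter values are hypothetical, for illustration only.
    """
    x = (math.log10(m_halo) - log_m_min) / sigma_log_m
    return 0.5 * (1.0 + math.erf(x))


def mean_satellite(m_halo, m_cut=1.0e12, m_1=2.0e13, alpha=1.0):
    """Mean number of satellites: a power law above a cut-off mass,
    modulated by the central occupation (assumed form)."""
    if m_halo <= m_cut:
        return 0.0
    return mean_central(m_halo) * ((m_halo - m_cut) / m_1) ** alpha


def mean_occupation(m_halo):
    """Total mean number of galaxies hosted by a halo of mass m_halo."""
    return mean_central(m_halo) + mean_satellite(m_halo)


if __name__ == "__main__":
    # Halo masses in solar masses (units are only as good as the parameters).
    for m in (1.0e11, 1.0e12, 1.0e13, 1.0e14, 1.0e15):
        print(f"M_halo = {m:.1e}  <N> = {mean_occupation(m):.3f}")
```

Note that nothing in this description refers to cooling, star formation, or feedback: the occupation depends on halo mass alone, which is why such a model is easy to fit but hard to interpret physically.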

Clearly, each of these methods has its own advantages and weaknesses, and they 
should be viewed as complementary rather than competitive. In the following, I will 
focus on semi-analytic models of galaxy formation. In particular, I will provide a brief 
overview of the methods, aims, and limits of these techniques, and give a brief summary 
of their recent successes and open problems. 

METHODS, AIMS, AND LIMITS 

The backbone of any semi-analytic model is provided by what in the jargon is called a 
dark matter 'merger tree', which essentially provides a representation of the assembly 
history of a dark matter halo. Early renditions of semi-analytic models - but this is still 
the case for a large number of applications today - took advantage of the extended 
Press-Schechter (EPS) formalism [6, 7] and Monte Carlo methods to construct representative 
histories of merger trees leading to the formation of haloes of a given mass. It is 
important to note that some recent work has demonstrated that this formalism might not 
provide an adequate description of the merger trees extracted directly from numerical 
simulations [8, 9, 10]. Although some of these studies have provided 'corrections' to 
analytic merger trees, many applications are still carried out using the classical EPS 
formalism, and little work has been done so far to understand to what extent this can 
affect predictions of galaxy formation models. 

As mentioned earlier, modern semi-analytic models (sometimes referred to as hybrid 
models) take advantage of high-resolution N-body simulations to follow the evolution 
of dark matter haloes in its full geometrical complexity [11, 12]. On a next level of 
complexity, some more recent implementations have explicitly taken into account dark 
matter substructures, i.e. the haloes within which galaxies form are still followed when 
they are accreted onto a more massive system [13, 14]. There is one important caveat to 
bear in mind regarding these methods: dark matter substructures are fragile systems that 
are rapidly and efficiently destroyed below the resolution limit of the simulation. 
Since the baryons are more concentrated than dark matter, it is to be expected that the 
baryonic component will be more resistant to the tidal stripping that reduces its parent 
halo mass. This creates a complex and strongly position-dependent relation between dark 
matter substructures and galaxies, contrary to what was assumed in early HOD models. 
In addition, this treatment introduces a complication due to the presence of 'orphan 
galaxies', i.e. galaxies whose parent substructure mass has been reduced below the 
resolution limit of the simulation. In most of the available semi-analytic models, these 
galaxies are assumed to merge onto the corresponding central galaxies after a residual 
merging time which is given by some variation of the classical dynamical friction 
formula. Only a few models account for the stripping of stars due to tidal interactions 
with the parent halo. 
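
As an illustration of the kind of residual merging time assigned to orphan galaxies, the sketch below evaluates one textbook form of the classical dynamical friction estimate for a satellite on a circular orbit in an isothermal halo, t = 1.17 r^2 V_c / (G m_sat ln Λ). The prefactor, the choice ln Λ = ln(1 + M_host/m_sat) for the Coulomb logarithm, the function name, and the example numbers are all assumptions made here for illustration; individual semi-analytic models adopt different variations of this formula.

```python
import math

G_KPC = 4.30091e-6        # gravitational constant in kpc (km/s)^2 / Msun
KM_PER_KPC = 3.0857e16    # kilometres in one kiloparsec
SEC_PER_GYR = 3.15576e16  # seconds in one gigayear


def merging_time_gyr(r_kpc, v_circ_kms, m_sat_msun, m_host_msun):
    """Dynamical friction merging time in Gyr (illustrative form).

    r_kpc       : radius of the satellite orbit when it becomes an orphan
    v_circ_kms  : circular velocity of the host halo
    m_sat_msun  : satellite mass; m_host_msun : host halo mass
    """
    # Assumed Coulomb logarithm; other choices are used in the literature.
    coulomb_log = math.log(1.0 + m_host_msun / m_sat_msun)
    # r^2 V / (G m) has units of kpc s / km; convert to seconds, then to Gyr.
    t_kpc_s_per_km = 1.17 * r_kpc**2 * v_circ_kms / (
        coulomb_log * G_KPC * m_sat_msun
    )
    return t_kpc_s_per_km * KM_PER_KPC / SEC_PER_GYR


if __name__ == "__main__":
    # Hypothetical example: a 1e11 Msun satellite at 200 kpc in a 1e13 Msun host.
    print(merging_time_gyr(200.0, 200.0, 1.0e11, 1.0e13))
```

The strong dependence on the starting radius and on the satellite mass shows why the number of orphan galaxies is sensitive both to the resolution limit of the simulation and to the adopted variant of the formula.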

Once the backbone of the model is constructed, using either N-body simulations or 
analytic methods, galaxy formation and evolution is 'coupled' to the merger trees using 
a set of analytic laws that are based on theoretical and/or observational arguments, 
to describe complex physical processes like star formation, supernovae and AGN feedback 
processes, etc. Adopting this formalism, it is possible to express the full galaxy 
formation process through a set of differential equations that describe the variation in 
mass of the different galactic components (e.g. gas, stars, metals). Given our limited 
understanding of the physical processes at play, these equations contain free parameters 
whose values are typically chosen in order to provide a reasonably good agreement with 
observational data in the local Universe. 
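
A deliberately simplified sketch of such a set of differential equations is given below. It follows only three reservoirs (hot gas, cold gas, stars) and omits metals, mergers, and AGN feedback. The specific laws (cooling on a fixed time-scale, star formation proportional to the cold gas mass over a dynamical time, reheating proportional to the star formation rate) and every parameter name and value (t_cool, t_dyn, alpha_sf, eps_reheat) are assumptions made for illustration, not the prescriptions of any particular published model.

```python
def evolve(m_hot, m_cold, m_star, t_end, dt=0.01,
           t_cool=2.0, t_dyn=0.1, alpha_sf=0.05, eps_reheat=3.0):
    """Integrate a toy three-reservoir model with forward Euler steps.

    Masses are in arbitrary units, times in Gyr. alpha_sf and eps_reheat
    play the role of the 'free parameters' discussed in the text.
    """
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        cooling = m_hot / t_cool             # hot gas condensing onto the galaxy
        sfr = alpha_sf * m_cold / t_dyn      # star formation rate
        reheating = eps_reheat * sfr         # cold gas returned to the hot phase
        # Rates are evaluated before any reservoir is updated.
        m_hot += (reheating - cooling) * dt
        m_cold += (cooling - sfr - reheating) * dt
        m_star += sfr * dt
    return m_hot, m_cold, m_star


if __name__ == "__main__":
    # Hypothetical halo starting with all its baryons in the hot phase.
    for eps in (1.0, 3.0, 6.0):
        hot, cold, star = evolve(1.0e10, 0.0, 0.0, t_end=5.0, eps_reheat=eps)
        print(f"eps_reheat = {eps}: stars = {star:.3e}")
```

Changing eps_reheat alters the stellar mass, the cold gas mass, and the hot gas mass at the same time, which illustrates why a single parameter is usually constrained by several independent observables.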

One common criticism of semi-analytic models is that there are too many free 
parameters. It should be noted, however, that the number of these parameters is not larger 
than the number of published comparisons with different and independent sets of ob- 
servational data, for any of the semi-analytic models discussed in the recent literature. 
In addition, these are not 'statistical' parameters but, as explained above, they are due 
to our lack of understanding of the physical processes considered. A change in any of 
these parameters has consequences for a number of different predicted properties, so 
that often there is little parameter degeneracy for a given set of prescriptions. Finally, 
observations and theoretical arguments often provide important constraints on the range 
of values that different parameters can assume. 

One great advantage of hybrid methods over classical techniques based 
on the EPS formalism is that they provide full dynamical information about model 
galaxies. Using this approach, it becomes possible to construct realistic mock catalogues 
that contain not only physical information about model galaxies such as masses, star 
formation rates, luminosities, etc. but also dynamically consistent redshift and spatial 
information, as in real redshift surveys. Using these mock catalogues, it is then possible 
to carry out detailed comparisons with observational data at different cosmic epochs. 
These comparisons provide useful information on the relative importance of different 
physical processes in establishing a certain observational trend, and on the physics which 
may be missing in these models. 
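
One ingredient of such mock catalogues can be sketched briefly: the redshift that an observer would measure for a model galaxy combines its cosmological redshift with the Doppler shift from its peculiar velocity along the line of sight, 1 + z_obs = (1 + z_cos)(1 + v_los/c), valid for velocities much smaller than c. The function name, the argument conventions, and the example values below are assumptions for illustration only.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def observed_redshift(z_cos, position, velocity):
    """Redshift seen by an observer at the origin.

    z_cos    : cosmological redshift of the galaxy
    position : (x, y, z) of the galaxy relative to the observer (any unit)
    velocity : (vx, vy, vz) peculiar velocity in km/s
    Uses the non-relativistic Doppler term, adequate for v << c.
    """
    distance = math.sqrt(sum(x * x for x in position))
    if distance == 0.0:
        raise ValueError("galaxy coincides with the observer")
    # Project the peculiar velocity onto the line of sight.
    v_los = sum(x * v for x, v in zip(position, velocity)) / distance
    return (1.0 + z_cos) * (1.0 + v_los / C_KMS) - 1.0


if __name__ == "__main__":
    # Hypothetical galaxy receding at 500 km/s along the line of sight.
    print(observed_redshift(0.1, (100.0, 0.0, 0.0), (500.0, 0.0, 0.0)))
```

Because N-body based models carry positions and velocities for every galaxy, this step can be applied object by object, which is what allows redshift-space effects to be included consistently in the mock survey.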



RECENT SUCCESSES AND OPEN PROBLEMS 

In discussing recent successes of semi-analytic models, I will start from the most fundamental 
description of the galaxy population: the galaxy luminosity function. Since early 
implementations of semi-analytic techniques, it has been clear that a relatively strong 
supernovae feedback was needed in order to suppress the large excess of faint galaxies, due to 
the steep increase in the number of dark matter haloes towards low masses [17, 18]. It is 
interesting to note that matching the faint end of the luminosity function comes at the 
expense of exacerbating the excess of luminous objects, because the material reheated and/or 
ejected by low-mass galaxies ends up in the hot gas associated with central galaxies of 
relatively massive haloes. At later times, this material cools efficiently onto the corre- 
sponding central galaxies increasing their luminosities and star formation rates, at odds 
with observational data. 

Matching the bright end of the luminosity function has proved difficult for a long 
time, and a good match has been achieved only recently using a relatively strong form 
of 'radio-mode' AGN feedback [19, 20]. Different prescriptions of AGN feedback have 
been proposed, and much work remains to be done in order to understand if and 
how the energy injected by intermittent radio activity at the cluster centre is able to 
efficiently suppress the cooling flows. Recent observational measurements indicate that 
the ensemble-averaged power from radio galaxies seems sufficient to offset the mean 
level of cooling [21]. It is, however, important to note that not every cluster shows central 
radio activity, and that the steep dependence of the radiative cooling function on density 
makes it difficult to stabilize cooling flows at all radii. 

The main reason for the success of the 'radio-mode' AGN feedback is that it is not 
connected to star formation, so that its implementation makes it possible both to 
suppress the luminosity of massive galaxies and to keep their stellar populations old [22]. 
Therefore these models seem to reproduce, at least qualitatively, the observed trend for 
more massive ellipticals to have shorter star formation time-scales. A good quantitative 
agreement has not been shown yet and is complicated by large uncertainties associated 
with star formation histories extracted from observed galaxy spectra. 

The suppression of late cooling (and therefore star formation) does not affect, however, 
the assembly history of massive galaxies, for which models predict an increase in 
stellar mass by a factor of 2 to 4 since z ~ 1, depending on stellar mass [22, 23]. This 
creates a certain tension with the observation that the massive end of the galaxy mass 
function does not appear to evolve significantly over the same redshift interval. Part of 
this tension is removed when taking into account observational errors and uncertainties 
on galaxy mass estimates [24, 25, see also Monaco, these proceedings]. For the mass 
assembly of the brightest cluster galaxies (BCGs), the situation is worse: while observations 
seem consistent with no mass growth since z ~ 1, models predict an increase in 
mass by a factor of about 4 [23, 26]. One major caveat in this comparison is given by the 
fact that observational studies usually adopt fixed metric aperture magnitudes (which account 
for 25-50 per cent of the total light contained in the BCG and intra-cluster light), 
while models compute total magnitudes. Semi-analytic models do not provide information 
regarding the spatial distribution of the BCG light, so aperture magnitudes cannot 
be calculated. In addition, most of the available models do not take into account the 
stripping of stars from other cluster galaxies due to tidal and harassment effects [27, 28]. 

Most of the models currently available exhibit a remarkable degree of agreement with 
a large number of observations for the galaxy population in the local Universe (e.g. the 
observed relations between stellar mass, gas mass, and metallicity; the observed lumi- 
nosity, colour, and morphology distribution; the observed two-point correlation func- 
tions). When analysed in detail, however, some of these comparisons show important 
and systematic (i.e. common to most of the semi-analytic models discussed in the liter- 
ature) disagreements. 

Although models are not usually tuned to match observations of galaxy clustering, 
they generally reproduce the observed dependence of clustering on magnitude or colour. 
The agreement appears particularly good for the dependence on luminosity, while the 
amplitude difference on colour appears greater in the models than observed [29, 30]. 
This problem might be (at least in part) related to the excess of small red satellite galaxies 
which plagues all models discussed in the recent literature (e.g. see Fig. 11 in [19] and 
Fig. 4 in [31]; see also Monaco, these proceedings). 

A generic excess of intermediate to low-mass galaxies has been discussed by Fontana 
et al. [32]. At low redshift, this excess is largely due to satellite galaxies that were formed 
and accreted early on, and that are dominated by old stellar populations. Semi-analytic 
models assume that when a galaxy is accreted onto a larger structure, the gas supply 
can no longer be replenished by cooling, which is suppressed by an instantaneous and 
complete stripping of the hot gas reservoir. Since this process (commonly referred to 
as 'strangulation') is usually combined with a relatively efficient supernovae feedback, 
galaxies that are accreted onto a larger system consume their gas very rapidly, moving 
onto the red sequence on quite short time-scales [33]. This contributes to producing 
an excess of faint and red satellites and a transition region (sometimes referred to as 
'green valley') which does not appear to be as well populated as observed. 

Much effort has been recently devoted to this problem. McCarthy et al. have 
used high-resolution hydrodynamic simulations to show that galaxies are able to retain 
a significant fraction of their hot haloes following virial crossing. Font et al. 
incorporated a simple model based on these simulations within the Durham semi-analytic 
model. With this modification, a larger fraction of satellites has bluer colours, 
resulting in a colour distribution that is in better (but not perfect) agreement with 
observational data. 

The completion of new high-redshift surveys has recently pushed comparisons be- 
tween model results and observational data to higher redshift. I do not have time here 
to discuss in detail all agreements and disagreements discussed in the recent literature. I 
would like to stress, however, that this still rather unexplored regime for modern models 
is very interesting because it is at high redshift that predictions from different models 
differ most dramatically. 



To close this section, I would like to recall that a long standing problem for hierarchical 
models has been to match the zero-point of the Tully-Fisher relation (the observed 
correlation between the rotation speed and the luminosity of spiral disks [38]) while 
reproducing at the same time the observed luminosity function. As discussed in Baugh 
[39], no model with a realistic calculation of galaxy size has been able to match the 
zero-point of the Tully-Fisher relation using the circular velocity of the disk measured at 
the half mass radius. It remains unclear if this difficulty is related to some approximation 
in the size calculation, or if it is related to more fundamental shortcomings of the cold 
dark matter model. 



CONCLUSIONS 

Given our poor knowledge of most of the physical processes at play, ab initio treatments 
of the galaxy formation process are extremely difficult, if not unfeasible. In the past 
decades, we have developed a number of techniques to study galaxy formation within 
the currently standard cosmological model. Semi-analytic models represent the most 
developed of these techniques to make detailed predictions of galaxy properties at 
different cosmic epochs. 

These models are not meant to be definitive. Rather, they need to be falsified against 
observational data, in order to gain insight into the relative importance of different physical 
processes, and into the physics which may still be missing in the models. When 
comparing model results with observational data, it is important to take into account 
observational errors and biases which may be introduced by a particular observational 
selection and/or strategy. To this aim, realistic mock catalogues can be constructed 
by coupling semi-analytic techniques with large cosmological N-body simulations. Recently, 
a number of model results have been made publicly available in the context of the 
modern concept of a 'Theoretical Virtual Observatory'. Considerable interest has been 
shown by the astronomical community, and a large number of papers using the public 
database have already been published, resulting in a rapid refinement and verification of 
theoretical modelling. 

The largest success of these techniques is to have shown that we can study galaxy 
formation within the currently established hierarchical paradigm. The largest failure is, 
unsurprisingly, that we have not yet solved the galaxy formation problem. Undoubtedly, 
however, we have learnt a great deal about how galaxies form and evolve, and how their 
physical properties are related to the dark matter haloes in which they reside. 